Manipuri Morpheme Identification
Authors
Abstract
Identifying the morphemes of a Manipuri word is the real bottleneck for any Manipuri Natural Language Processing (NLP) work. Manipuri is one of the Scheduled Languages of India, and it has seen little progress so far in NLP applications because the language is highly agglutinative. Segmenting a word and identifying its morphemes is therefore necessary before proceeding with any Manipuri NLP application. A highly inflected word may sometimes contain ten or more affixes. These affixes are morphemes that change the semantic and grammatical structure of the word, so inflection plays an important role. Words are segmented into syllables, and the syllables are examined to extract the morphemes. This work is implemented on Manipuri words written in the Meitei Mayek script, because its syllable formations are more distinct than those of Manipuri written in the Bengali script. A combination of 2-grams (bigrams) and the standard deviation technique is used to identify the morphemes. The system achieves a recall of 59.80%, a precision of 83.02% and an F-score of 69.52%.
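The abstract does not spell out how the bigram counts and the standard deviation are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes words are already segmented into syllables, counts adjacent syllable pairs over a corpus, and places a morpheme boundary wherever a pair's count falls below the mean minus `k` standard deviations. The names `bigram_counts`, `split_word`, `f_score`, the parameter `k`, and the placeholder syllables `s1` to `s4` are all hypothetical. The last function only checks that the reported F-score is the harmonic mean of the reported precision and recall.

```python
from collections import Counter
from statistics import mean, pstdev


def bigram_counts(words):
    """Count adjacent syllable pairs over a corpus of syllabified words."""
    counts = Counter()
    for syllables in words:
        counts.update(zip(syllables, syllables[1:]))
    return counts


def split_word(syllables, counts, k=1.0):
    """Group syllables into segments, cutting at unusually rare bigrams.

    Assumption (not from the paper): a boundary is placed between two
    syllables when their corpus count is below mean - k * sd, where mean
    and sd are taken over all bigram counts in the corpus.
    """
    if not syllables:
        return []
    if not counts:
        return [list(syllables)]
    values = list(counts.values())
    threshold = mean(values) - k * pstdev(values)
    segments, current = [], [syllables[0]]
    for left, right in zip(syllables, syllables[1:]):
        if counts[(left, right)] < threshold:
            segments.append(current)  # rare pair: close the current segment
            current = []
        current.append(right)
    segments.append(current)
    return segments


def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy corpus of placeholder syllables; this is not real Manipuri data.
TOY_CORPUS = (
    [["s1", "s2"]] * 3
    + [["s3", "s4"]] * 3
    + [["s1", "s2", "s3", "s4"]]
)
TOY_COUNTS = bigram_counts(TOY_CORPUS)

print(split_word(["s1", "s2", "s3", "s4"], TOY_COUNTS))
print(round(f_score(83.02, 59.80), 2))
```

In the toy corpus the pair (`s2`, `s3`) occurs once while the other two pairs occur four times each, so it is the only pair below the threshold and the four-syllable word is cut in the middle. A real system would need a syllabifier for Meitei Mayek and a tuned threshold, neither of which is described in the abstract.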
Similar resources
Web Based Manipuri Corpus for Multiword NER and Reduplicated MWEs Identification using SVM
A web-based Manipuri corpus is developed for the identification of reduplicated multiword expressions (MWEs) and multiword named entity recognition (NER). Manipuri is one of the rarely investigated languages, and its resources for natural language processing are not available in the required measure. The web content of Manipuri is also very poor. News corpus from a popular Manipuri news website is coll...
Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification
This paper deals with the identification of Multiword Expressions (MWEs) in Manipuri, a highly agglutinative Indian language. Manipuri is listed in the Eighth Schedule of the Indian Constitution. MWEs play an important role in applications of Natural Language Processing (NLP) such as Machine Translation, Part of Speech tagging, Information Retrieval, Question Answering etc. Feature selection is an i...
Manipuri Chunking: An Incremental Model with POS and RMWE
This paper records the work on Manipuri chunking using the commonly used tool of Support Vector Machine (SVM). Because Manipuri is a highly agglutinative language, the features for running the SVM have to be selected carefully. An experiment is performed with 35,000 words to check whether the POS tags and the Reduplicated Multiword Expression (RMWE) can improve the Chunk identificatio...
Reduplicated MWE (RMWE) helps in improving the CRF based Manipuri POS Tagger
This paper gives a detailed overview of the modified feature selection in CRF (Conditional Random Field) based Manipuri POS (Part of Speech) tagging. Feature selection is so important in CRF that the better the features, the better the outputs. This work is an attempt to make the previous work more efficient. Multiple new features are tried to run the CRF and ...
Will the Identification of Reduplicated Multiword Expression (RMWE) Improve the Performance of SVM Based Manipuri POS Tagging?
Reduplicated Multiword Expressions (RMWEs) are abundant in Manipuri, a highly agglutinative Indian language. The Part of Speech (POS) tagging of Manipuri using Support Vector Machine (SVM) has been developed and evaluated. The POS tagger has been updated with identified RMWEs as another feature. The performance of the SVM based POS tagger before and after adding RMWE as a feature has been com...
Journal title:
Volume, issue
Pages -
Publication date: 2012